1998-03-25
Back To The Basics
by SPo0ky
I wrote this tutorial because some beginners who read the first two
editions of our magazine told us that they had problems understanding the
basics of assembly. In this tutorial I won't try to teach you any virus
techniques. I'll only try to explain the most basic things about assembly,
like how TASM/TLINK work, registers, memory and interrupts, as simply as
possible.
Maybe this sounds boring to you, but if you want to code your own viruses
you have to understand the basics.
Also, this article will not fully teach you assembly! It will only help you
to understand the basics. To really learn assembly you will need at least a
few weeks of good practice (programming). I also suggest that you buy a
good book about Assembler and read many, many, many virus source codes
(that's the way I learned Assembler).
(Also: if you don't have a bookstore in your town, you can use the online
bookstore at http://intertain.com. They have many great books, and they
are fast and cheap!)
1. Why Assembler?
There are pros and cons to using Assembler.
Unlike HLLs (High Level Languages) such as Pascal, Basic or C++, in
Assembler you have to tell the CPU every single step it has to execute.
This means that writing a big (complex) Assembler program is very time
consuming. That's why big programs are usually not written completely in
Assembler; instead, Assembler parts are included in HLL programs.
Another con of Assembler is that programs cannot run on CPU families other
than the one they were written for, because each CPU family has its own
instruction set (we will use the 80x86 instruction set, which is used in
the IBM PC and compatibles). On the other hand, this lets you optimize
the code for one specific CPU so that you can use all of its capabilities.
The result can be extremely FAST and SMALL code.
2. A simple assembly program
Let's start with a short Assembler program. You don't have to understand
what each instruction is used for yet; I'll explain that later.
Just type this program into a plain ASCII text editor (e.g. EDIT.COM) and
save it as example.asm.
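.model tiny                 ; one segment for code and data (COM file)
.code                       ; executable code starts here
org 100h                    ; COM programs start at offset 100h
start:                      ; entry point, referenced by END below
mov ah,9h                   ; DOS function 9: display a string
mov dx,offset message       ; DS:DX -> the text to display
int 21h                     ; call DOS
mov ax,4c00h                ; DOS function 4Ch, return code 0
int 21h                     ; exit to DOS
message db 'CodeBreakers Rule! ;-)',10,13,'$'  ; text, line break, end marker
end start                   ; end of source; execution begins at 'start'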
3. Assembler and Linker (TASM and TLINK)
Before I continue, I'll show you how to use TASM and TLINK to turn such
a file into a COM or EXE program.
After you have saved the above program into a file (example.asm), you can type
TASM EXAMPLE.ASM
This will generate a so-called OBJECT FILE named EXAMPLE.OBJ.
This file contains the above code translated into binary (machine code),
plus some information for the linker.
Example:
MOV AH,9h
would be translated into 1011 0100 0000 1001 (B4 09 in hexadecimal)
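Where do those bits come from? The 8086 encodes "MOV into an 8-bit register" as one opcode byte, B0h plus a 3-bit register number (AH is register 4), followed by the value itself. The Python below is only used as a calculator to reproduce the example. The function names are our own, and only this one MOV form is covered.

```python
# 8086 register numbers for 8-bit registers in "MOV reg8, imm8"
# (opcode = B0h + register number, followed by the 8-bit value).
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}

def encode_mov_reg8_imm8(reg, value):
    """Return the 2-byte machine code for MOV reg8, imm8."""
    if not 0 <= value <= 0xFF:
        raise ValueError("an 8-bit register holds only 0..FFh")
    return bytes([0xB0 + REG8[reg], value])

def as_bits(code):
    """Show machine code as groups of 4 bits, like the text does."""
    bits = "".join(f"{b:08b}" for b in code)
    return " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))

code = encode_mov_reg8_imm8("AH", 0x9)
print(code.hex(" ").upper())  # B4 09
print(as_bits(code))          # 1011 0100 0000 1001
```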
This object file is not executable yet. To make an executable file (COM or
EXE) we have to use a LINKER (TLINK).
The linker combines one or more object files into one executable
file.
To link the example.obj file you just type:
TLINK /t EXAMPLE.OBJ
The /t switch tells the linker that it should produce a COM file; if you
leave out /t you will get an EXE file. Anyway, you should now have a
ready-to-run COM file... Just type EXAMPLE to start it!
4. Registers
Registers are extremely fast storage cells inside the CPU. They are
used to address memory, to pass values to functions (like the DOS
interrupt), and generally to hold the "values" the CPU is working with.
All registers of the 8086 are 16 bits (= 2 bytes) wide, and some registers
can be split into two 8-bit (= 1 byte) registers.
Registers of the 8086 CPU:
+---------+---------+
|   AH    |   AL    |  AX -> Accumulator Register
|   BH    |   BL    |  BX -> Base Register
|   CH    |   CL    |  CX -> Count Register
|   DH    |   DL    |  DX -> Data Register
+---------+---------+
|        SI         |  -> Source Index
|        DI         |  -> Destination Index
+-------------------+
|        BP         |  -> Base Pointer
|        SP         |  -> Stack Pointer
+-------------------+
|        CS         |  -> Code Segment
|        DS         |  -> Data Segment
|        ES         |  -> Extra Segment
|        SS         |  -> Stack Segment
+-------------------+
|        IP         |  -> Instruction Pointer
+-------------------+
|       FLAGS       |  -> Flags Register
+-------------------+
Only the registers AX, BX, CX and DX can be split into two parts:
AH, AL, BH, BL, CH, CL, DH, DL. Each of them has only 8 bits instead of
16! The H part holds the high (upper) 8 bits, the L part the low 8 bits.
BTW - 8 bits are called a BYTE
16 bits are called a WORD
AH, AL, BH, BL, and so on, are called BYTE registers, and all others
(AX, BX, CX, DX, SI, DI, ...) are called WORD registers.
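Since AH is the high byte and AL the low byte of AX, the relation is simply AX = AH * 256 + AL. Here is a small Python sketch of that arithmetic. The helper names are made up for illustration and are not part of any assembler.

```python
def split(word):
    """Split a 16-bit value (e.g. AX) into (high byte, low byte) = (AH, AL)."""
    return (word >> 8) & 0xFF, word & 0xFF

def join(high, low):
    """Combine a high byte (AH) and a low byte (AL) into a word (AX)."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

ah, al = split(0x4C00)
print(hex(ah), hex(al))       # 0x4c 0x0
print(hex(join(0x09, 0x41)))  # 0x941
```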
5. MOV(e)
The CPU provides many instructions which allow you to
manipulate (change) the data stored in a register. One of the most
important of them is MOV.
Look back at our example program... we used MOV 3 times (MOV AH,9 /
MOV DX,OFFSET MESSAGE / MOV AX,4C00H).
You can change the data in a register by using MOV <REG>,<DATA>, where
<REG> is the 16- or 8-bit register you want to change, and <DATA> is the
data you want to store in the register.
But MOV can be used not only to change the data in registers; it can also
be used to change data stored at a certain location in MEMORY.
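One detail of MOV on the split registers is easy to miss: MOV AH,9 changes only the upper half of AX and leaves AL alone, while MOV AX,4C00h overwrites both halves at once. The toy Python model below is hypothetical. It only covers AX to DX, and it ignores memory operands and flags.

```python
class Registers:
    """Toy model of AX..DX, where AH/AL etc. are views onto the same 16 bits."""

    WORDS = ("AX", "BX", "CX", "DX")

    def __init__(self):
        self.words = {name: 0 for name in self.WORDS}

    def mov(self, reg, value):
        """Model MOV reg, value for the general-purpose registers."""
        reg = reg.upper()
        if reg in self.words:                   # 16-bit: replace all 16 bits
            self.words[reg] = value & 0xFFFF
            return
        word = reg[0] + "X"                     # AH/AL -> AX, BH/BL -> BX, ...
        old = self.words[word]
        if reg[1] == "H":                       # replace the high byte only
            self.words[word] = ((value & 0xFF) << 8) | (old & 0x00FF)
        else:                                   # replace the low byte only
            self.words[word] = (old & 0xFF00) | (value & 0xFF)

    def get(self, reg):
        reg = reg.upper()
        if reg in self.words:
            return self.words[reg]
        word = self.words[reg[0] + "X"]
        return word >> 8 if reg[1] == "H" else word & 0xFF

regs = Registers()
regs.mov("AX", 0x1234)
regs.mov("AH", 0x09)                            # like MOV AH,9
print(hex(regs.get("AX")))                      # 0x934 (AL kept its 34h)
regs.mov("AX", 0x4C00)                          # like MOV AX,4C00h
print(hex(regs.get("AH")), hex(regs.get("AL"))) # 0x4c 0x0
```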
6. MEMORY
When you execute our little example program it just displays text... now,
how does the computer know WHICH text it should display? Again, look at the
example program's source code:
MOV AH, 9
MOV DX, OFFSET MESSAGE
INT 21H
.
.
MESSAGE DB 'Some text...$'
The first line selects the DOS function: function 9 in AH means "display
text". The second line tells DOS where it has to look for the text in
memory.
Each byte in memory has a number (an address), so the CPU knows exactly
which byte it has to read/write.
In the second line, TASM replaces "OFFSET MESSAGE" with the address
(offset) where MESSAGE is stored in memory, and MOV stores it in DX (to
display text, MS-DOS requires that the address of the text is in register
DX; more exactly in DS:DX, but in a COM program DS already points to our
segment).
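TASM works out OFFSET MESSAGE while assembling, by counting bytes from the start address set with ORG 100h. The sketch below repeats that arithmetic in Python for our example program. It assumes the standard 8086 instruction sizes: 2 bytes for MOV AH,9h and for INT 21h, and 3 bytes for MOV DX,... and MOV AX,... (a 16-bit value needs two bytes).

```python
# Instruction sizes in bytes for the example program (standard 8086
# encodings: MOV r8,imm8 = 2, MOV r16,imm16 = 3, INT n = 2).
program = [
    ("mov ah,9h", 2),
    ("mov dx,offset message", 3),
    ("int 21h", 2),
    ("mov ax,4c00h", 3),
    ("int 21h", 2),
]

ORIGIN = 0x100  # from "org 100h"

def offset_of_message(instructions, origin):
    """The label 'message' sits right after the last instruction."""
    return origin + sum(size for _, size in instructions)

print(hex(offset_of_message(program, ORIGIN)))  # 0x10c
```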
Some examples:
Let's say this is our memory:
Offset:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Data:   | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P |
We want to get the data which is stored at offset 6 into register AH.
Which instruction would be used?
-> MOV AH, [6]
This would put the data at offset 6 (= 'F') into register AH. The '[' and
']' are very important: if you forget them, it would put the number 6 into
AH instead of 'F'!
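The difference the brackets make can be modelled in a few lines of Python. This is an illustration only: memory is a plain byte array laid out like the diagram, with index 0 unused because the diagram starts counting at 1, and the function names are hypothetical.

```python
# Memory laid out like the diagram; index 0 is unused so that the
# offsets match the diagram, which starts counting at 1.
memory = bytearray(b"\x00ABCDEFGHIJKLMNOP")

def mov_immediate(value):
    """MOV AH, 6   -> AH gets the number itself."""
    return value & 0xFF

def mov_from_memory(mem, offset):
    """MOV AH, [6] -> AH gets the byte stored at that offset."""
    return mem[offset]

print(mov_immediate(6))                 # 6
print(chr(mov_from_memory(memory, 6)))  # F
```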
Remember, AH is an 8-bit register (-> it can store only 1 byte, or 8 bits).
What would happen if we used AX (which is a 16-bit register) instead of
AH?
-> MOV AX, [6]
AX would become 'GF'. Yes, not 'FG'! The x86 stores words "little-endian":
the byte at the lower address (offset 6, 'F') goes into the low half (AL),
and the next byte (offset 7, 'G') goes into the high half (AH). Written as
AH:AL, the word looks turned around. (That's not that important for you
yet, but you should know it anyway...)
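Python's `int.from_bytes(..., "little")` reads two bytes in the same low-byte-first order, so it can model MOV AX,[6]. This is a sketch under the same assumption as the diagram (offset 0 unused). The function name is our own.

```python
# Memory laid out like the diagram (index 0 unused so offsets match it).
memory = bytearray(b"\x00ABCDEFGHIJKLMNOP")

def mov_word_from_memory(mem, offset):
    """MOV AX, [offset]: byte at offset -> AL, byte at offset+1 -> AH."""
    return int.from_bytes(mem[offset:offset + 2], "little")

ax = mov_word_from_memory(memory, 6)
ah, al = ax >> 8, ax & 0xFF
print(hex(ax))            # 0x4746  ('G' = 47h, 'F' = 46h)
print(chr(ah) + chr(al))  # GF  (written as AH:AL)
```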
7. Interrupts
It took me a very long time to find a (hopefully) good way to explain
interrupts to a newbie! Finally I decided to use a simple example, the
MS-DOS prompt.
When you are at the MS-DOS prompt you can enter commands; after you press
RETURN the command gets executed. You could compare pressing the
RETURN key with an interrupt. In assembler you fill the registers with
values, then you execute the interrupt. The interrupt code then
evaluates the values you put into the registers, decides which
function it should execute, .... and finally returns the results
(in a register, on the screen, on your hard disk,...)
Ex.:
MOV AH,9
MOV DX,OFFSET MESSAGE
INT 21H
The first two lines of this example have already been explained above;
the 3rd line executes an interrupt, interrupt 21h (hex).
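The "evaluate the registers and decide which function to execute" step is essentially a table lookup on AH. The Python sketch below only models that dispatch idea. The table, the handler names and the returned descriptions are hypothetical, and only functions 09h and 4Ch from our example are included; the real DOS handler does far more.

```python
def display_string(regs):
    return f"display the '$'-terminated text at offset {regs['DX']:04X}h"

def terminate(regs):
    return f"exit to DOS with return code {regs['AX'] & 0xFF}"

# Function number (the value in AH) -> code that handles it.
INT21_FUNCTIONS = {0x09: display_string, 0x4C: terminate}

def int21(regs):
    """Toy INT 21h: read AH, pick the matching function, run it."""
    ah = regs["AX"] >> 8
    handler = INT21_FUNCTIONS.get(ah)
    if handler is None:
        raise NotImplementedError(f"function {ah:02X}h not modelled")
    return handler(regs)

print(int21({"AX": 0x0900, "DX": 0x010C}))  # display ... at offset 010Ch
print(int21({"AX": 0x4C00, "DX": 0}))       # exit to DOS with return code 0
```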
Interrupts are numbered from 0 to FFh (256 in total), and different
interrupts provide different 'services', for example:
INT 21H, this is the MS-DOS interrupt, it provides basic DOS functions,
like input/output of text, file functions (like open, read, write to
files).
INT 13H is the BIOS disk interrupt; it provides low-level disk access
functions like reading/writing/formatting of disk sectors.
INT 10H is the video interrupt, this interrupt allows you to use many
functions to make nice graphics :-) It provides functions to change the
video mode, to draw pixels onto the screen, to change the color of text,
and so on...
For a list of all interrupts and their functions, download Ralf Brown's
Interrupt List from http://www.cs.cmu.edu/afs/cs.cmu.edu/user/ralf/pub/WWW/
8. The sample program, step-by-step
.model tiny
-----------
This is not code which will later be executed... it just tells TASM/TLINK
that code and data share one single segment (max. 64 KB) for the whole
program, which is what a COM file needs. There are also .model small,
.model huge, etc.,... but for small programs (= simple viruses ;) model
tiny is enough.
.code
-----
This tells TASM and TLINK that our executable code begins here. After this
line we can begin to write our main program.
org 100h
--------
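DOS loads COM files into memory at offset 100h of their segment (the first
100h bytes hold the Program Segment Prefix, PSP). ORG 100h tells the
assembler to calculate all addresses as if the code started at offset
100h, so that values like OFFSET MESSAGE come out right.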
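Here is a simplified Python model of that loading step. A COM file has no header; it is just the program bytes, copied into the segment starting at offset 100h, right after the PSP (which begins with an INT 20h instruction, bytes CD 20). The bytes below are our own hand assembly of the example using standard 8086 encodings. They should match what TASM/TLINK produce for these simple instructions, but check with a disassembler if in doubt. Stack setup and the rest of the PSP are not modelled.

```python
# Hand-assembled bytes of the example program (standard 8086 encodings).
code = bytes([
    0xB4, 0x09,        # mov ah,9h
    0xBA, 0x0C, 0x01,  # mov dx,offset message  (010Ch, little-endian)
    0xCD, 0x21,        # int 21h
    0xB8, 0x00, 0x4C,  # mov ax,4c00h
    0xCD, 0x21,        # int 21h
])
message = b"CodeBreakers Rule! ;-)" + bytes([10, 13]) + b"$"
com_file = code + message  # a COM file is just these bytes, no header

def load_com(image):
    """Simplified DOS COM loader: one 64 KB segment, PSP in the first
    100h bytes, file contents copied in starting at offset 100h."""
    segment = bytearray(0x10000)
    segment[0x00:0x02] = b"\xCD\x20"  # the PSP starts with INT 20h
    segment[0x100:0x100 + len(image)] = image
    return segment

memory = load_com(com_file)
dx = int.from_bytes(memory[0x103:0x105], "little")  # operand of MOV DX
print(len(com_file), hex(dx), chr(memory[dx]))      # 37 0x10c C
```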
start:
------
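This is a label (a name for the address of the first instruction). TASM needs it because END START at the bottom uses it to mark where execution begins.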
MOV AH, 9
---------
We want to display text... we will use the DOS interrupt to do so.
INT 21h requires that we put the function number into register AH. So, to
tell DOS that we want to display some text, we 'MOV(E)' the number of
the function used to display text (9) into register AH.
MOV DX, OFFSET MESSAGE
----------------------
DOS needs to know where to find the text it should display... With
INT 21h function 9 we have to store this location in register DX. To do so
we just get the address of the message with 'OFFSET MESSAGE' and move it
into DX.
INT 21H
-------
Now that we have 'collected' enough information (filled the registers
with many stupid numbers) we can execute the interrupt, which will finally
get DOS to display some text for us. :-)
MOV AX,4C00H
------------
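Never forget the last two lines of this code! They are used to exit the
program. If you forget them, the CPU simply continues with whatever bytes
come next (here, the message text interpreted as instructions), and your
program will most likely crash.
DOS uses function 4Ch (in AH) to exit programs. The 00 in AL is the
"error code" (return code) handed back to DOS; 0 means that the program
ended without an error.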
INT 21H
-------
Now INT 21H will execute the function to exit the program...
message db 'CodeBreakers Rule! ;-)',10,13,'$'
------------------------------------------------
This is the message we WANT to display :-)
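'10,13' does the same as pressing return: 10 is a line feed (LF, cursor
down one line) and 13 a carriage return (CR, cursor back to the start of
the line). The usual order is 13,10, but on screen both orders start a new
line.
'$', this sign doesn't mean 'fast money' :) ... DOS needs to know where to
stop displaying text, and function 9 uses '$' as the end marker (a
convention DOS inherited from CP/M). This also means the text itself
cannot contain a '$'.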
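The end-marker rule is easy to see in a small Python model of function 9's scan. The function name is hypothetical, and real DOS writes the bytes to the screen instead of returning them. It outputs everything up to, but not including, the first '$', and passes LF and CR through unchanged. A second '$' in the text is simply never reached.

```python
LF, CR = 10, 13

def dos_print_string(memory, offset):
    """Model INT 21h function 09h: the bytes from 'offset' up to,
    but not including, the first '$'."""
    end = memory.index(b"$", offset)
    return memory[offset:end].decode("ascii")

message = b"CodeBreakers Rule! ;-)" + bytes([LF, CR]) + b"$"
print(repr(dos_print_string(message, 0)))  # 'CodeBreakers Rule! ;-)\n\r'
print(dos_print_string(b"Price: $5$", 0))  # output stops at the first '$'
```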
end start
---------
This marks the end of the source file. The label after END ('start') tells
TASM/TLINK where execution begins (the program's entry point).
btw - This doesn't exit the program! To exit the program you still have to
use function 4Ch with INT 21h.
OK, that's all for now... I know that this is a very basic tutorial, and
that it wasn't written very well... but I hope that it answered at least
a few of your questions! If you have any further questions feel free to use
the message board on our homepage at www.codebreakers.org, or email me at
spo0ky@thepentagon.com (spo0ky with zero! :)... Maybe I'll write a FAQ with
specific questions for the 4th edition.
--SPo0ky